Principles Of Statistics
A1.12 B1.15
Part II, 2001 comment(i) What are the main approaches by which prior distributions are specified in Bayesian inference?
Define the risk function of a decision rule . Given a prior distribution, define what is meant by a Bayes decision rule and explain how this is obtained from the posterior distribution.
(ii) Dashing late into King's Cross, I discover that Harry must have already boarded the Hogwarts Express. I must therefore make my own way onto platform nine and three-quarters. Unusually, there are two guards on duty, and I will ask one of them for directions. It is safe to assume that one guard is a Wizard, who will certainly be able to direct me, and the other a Muggle, who will certainly not. But which is which? Before choosing one of them to ask for directions to platform nine and three-quarters, I have just enough time to ask one of them "Are you a Wizard?", and on the basis of their answer I must make my choice of which guard to ask for directions. I know that a Wizard will answer this question truthfully, but that a Muggle will, with probability , answer it untruthfully.
Failure to catch the Hogwarts Express results in a loss which I measure as 1000 galleons, there being no loss associated with catching up with Harry on the train.
Write down an exhaustive set of non-randomised decision rules for my problem and, by drawing the associated risk set, determine my minimax decision rule.
My prior probability is that the guard I ask "Are you a Wizard?" is indeed a Wizard. What is my Bayes decision rule?
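As an illustration of the risk-set calculation (the Muggle's lying probability and the prior are elided above, so the value p and the prior weight below are assumed placeholders), here is a minimal sketch of the risks of the four non-randomised rules:

```python
# Sketch (assumed values): risk functions for the Wizard/Muggle problem.
# A guard is asked "Are you a Wizard?".  A Wizard answers "Yes"; a Muggle
# answers untruthfully ("Yes") with assumed probability p.  A decision rule
# maps the answer to which guard to approach; the loss is 1000 galleons if
# that guard turns out to be the Muggle.
p = 1/3        # assumed probability that a Muggle lies
LOSS = 1000.0

# Non-randomised rules: (action if "Yes", action if "No"),
# where "same" = ask the questioned guard, "other" = ask the other guard.
rules = {
    "always same":   ("same", "same"),
    "always other":  ("other", "other"),
    "follow answer": ("same", "other"),   # trust a "Yes"
    "contradict":    ("other", "same"),
}

def risk(rule, questioned_is_wizard):
    """Expected loss of a rule given the true identity of the questioned guard."""
    act_yes, act_no = rule
    if questioned_is_wizard:
        # A Wizard always answers "Yes"; loss occurs if we then ask the other guard.
        return LOSS * (act_yes == "other")
    # A Muggle answers "Yes" with probability p, "No" with probability 1 - p;
    # loss occurs whenever we end up asking the Muggle himself.
    return LOSS * (p * (act_yes == "same") + (1 - p) * (act_no == "same"))

prior_wizard = 0.5   # assumed prior that the questioned guard is the Wizard
for name, rule in rules.items():
    rw, rm = risk(rule, True), risk(rule, False)
    bayes = prior_wizard * rw + (1 - prior_wizard) * rm
    print(f"{name:13s}  R(Wizard)={rw:7.1f}  R(Muggle)={rm:7.1f}  "
          f"max={max(rw, rm):7.1f}  Bayes={bayes:7.1f}")
```

Plotting the pairs (R(Wizard), R(Muggle)) gives the risk set; the minimax rule minimises the larger coordinate, while the Bayes rule minimises the prior-weighted average.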
A2.11 B2.16
Part II, 2001 comment(i) Let be independent, identically-distributed random variables, .
Find a minimal sufficient statistic for .
Let and . Write down the distribution of , and hence show that is ancillary. Explain briefly why the Conditionality Principle would lead to inference about being drawn from the conditional distribution of given .
What is the maximum likelihood estimator of ?
(ii) Describe briefly the Bayesian approach to predictive inference,
Let be independent, identically-distributed random variables, with both unknown. Derive the maximum likelihood estimators of based on , and state, without proof, their joint distribution.
Suppose that it is required to construct a prediction interval
for a future, independent, random variable with the same distribution, such that
with the probability over the joint distribution of . Let
where , and , with the distribution function of .
Show that .
By considering the distribution of , or otherwise, show that
and show how to construct an interval with
[Hint: if has the -distribution with degrees of freedom and is defined by then for .]
A3.12 B3.15
Part II, 2001 comment(i) Explain what is meant by a uniformly most powerful unbiased test of a null hypothesis against an alternative.
Let be independent, identically distributed random variables, with known. Explain how to construct a uniformly most powerful unbiased size test of the null hypothesis that against the alternative that .
(ii) Outline briefly the Bayesian approach to hypothesis testing based on Bayes factors.
Let the distribution of be as in (i) above, and suppose we wish to test, as in (i), against the alternative . Suppose we assume a prior for under the alternative. Find the form of the Bayes factor , and show that, for fixed as .
A4.13 B4.15
Part II, 2001 commentWrite an account, with appropriate examples, of one of the following:
(a) Inference in multi-parameter exponential families;
(b) Asymptotic properties of maximum-likelihood estimators and their use in hypothesis testing;
(c) Bootstrap inference.
A1.12 B1.15
Part II, 2002 comment(i) Explain in detail the minimax and Bayes principles of decision theory.
Show that if is a Bayes decision rule for a prior density and has constant risk function, then is minimax.
(ii) Let be independent random variables, with .
Consider estimating by , with loss function
What is the risk function of ?
Consider the class of estimators of of the form
indexed by . Find the risk function of in terms of , which you should not attempt to evaluate, and deduce that is inadmissible. What is the optimal value of ?
[You may assume Stein's Lemma, that for suitably behaved real-valued functions ,
A2.11 B2.16
Part II, 2002 comment(i) Let be a random variable with density function . Consider testing the simple null hypothesis against the simple alternative hypothesis .
What is the form of the optimal size classical hypothesis test?
Compare the form of the test with the Bayesian test based on the Bayes factor, and with the Bayes decision rule under the 0-1 loss function, under which a loss of 1 is incurred for an incorrect decision and a loss of 0 is incurred for a correct decision.
(ii) What does it mean to say that a family of densities with real scalar parameter is of monotone likelihood ratio?
Suppose has a distribution from a family which is of monotone likelihood ratio with respect to a statistic and that it is required to test against .
State, without proof, a theorem which establishes the existence of a uniformly most powerful test and describe in detail the form of the test.
Let be independent, identically distributed . Find a uniformly most powerful size test of against , and find its power function. Show that we may construct a different, randomised, size test with the same power function for .
A3.12 B3.15
Part II, 2002 comment(i) Describe in detail how to perform the Wald, score and likelihood ratio tests of a simple null hypothesis given a random sample from a regular one-parameter density . In each case you should specify the asymptotic null distribution of the test statistic.
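For illustration only (the question's density is unspecified), a minimal sketch of the three statistics in an assumed exponential model with rate theta, testing a simple null theta = theta0; under the null each statistic is approximately chi-squared with one degree of freedom.

```python
# Sketch (assumed model): Wald, score, and likelihood-ratio statistics for
# testing H0: theta = theta0 in an exponential(theta) model with rate theta,
# based on an i.i.d. sample x.  All three are asymptotically chi-squared(1)
# under the null.
import numpy as np

def three_tests(x, theta0):
    n = len(x)
    s = np.sum(x)
    theta_hat = n / s                              # MLE of the rate
    loglik = lambda t: n * np.log(t) - t * s       # log-likelihood
    score = n / theta0 - s                         # score at theta0
    fisher = lambda t: n / t**2                    # expected information
    wald = (theta_hat - theta0) ** 2 * fisher(theta_hat)
    score_stat = score ** 2 / fisher(theta0)
    lr = 2 * (loglik(theta_hat) - loglik(theta0))
    return wald, score_stat, lr

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=200)           # simulated data, true rate 1
print(three_tests(x, theta0=1.0))                  # all modest under H0
```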
(ii) Let be an independent, identically distributed sample from a distribution , and let be an estimator of a parameter of .
Explain what is meant by: (a) the empirical distribution function of the sample; (b) the bootstrap estimator of the bias of , based on the empirical distribution function. Explain how a bootstrap estimator of the distribution function of may be used to construct an approximate confidence interval for .
Suppose the parameter of interest is , where is the mean of , and the estimator is , where is the sample mean.
Derive an explicit expression for the bootstrap estimator of the bias of and show that it is biased as an estimator of the true bias of .
Let be the value of the estimator computed from the sample of size obtained by deleting and let . The jackknife estimator of the bias of is
Derive the jackknife estimator for the case , and show that, as an estimator of the true bias of , it is unbiased.
A4.13 B4.15
Part II, 2002 comment(a) Let be independent, identically distributed random variables from a one-parameter distribution with density function
Explain in detail how you would test
What is the general form of a conjugate prior density for in a Bayesian analysis of this distribution?
(b) Let be independent Poisson random variables, with means and respectively, with known.
Explain why the Conditionality Principle leads to inference about being drawn from the conditional distribution of , given . What is this conditional distribution?
(c) Suppose have distributions as in (b), but that is now unknown.
Explain in detail how you would test against , and describe the optimality properties of your test.
[Any general results you use should be stated clearly, but need not be proved.]
A1.12 B1.15
Part II, 2003 comment(i) A public health official is seeking a rational policy of vaccination against a relatively mild ailment which causes absence from work. Surveys suggest that of the population are already immune, but accurate tests to detect vulnerability in any individual are too costly for mass screening. A simple skin test has been developed, but is not completely reliable. A person who is immune to the ailment will have a negligible reaction to the skin test with probability , a moderate reaction with probability and a strong reaction with probability 0.1. For a person who is vulnerable to the ailment the corresponding probabilities are and . It is estimated that the money-equivalent of work-hours lost from failing to vaccinate a vulnerable person is 20, that the unnecessary cost of vaccinating an immune person is 8, and that there is no cost associated with vaccinating a vulnerable person or failing to vaccinate an immune person. On the basis of the skin test, it must be decided whether to vaccinate or not. What is the Bayes decision rule that the health official should adopt?
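Several of the probabilities above are elided; purely to illustrate how the Bayes rule is read off from posterior expected losses, the sketch below uses assumed placeholder values for the prior and the reaction probabilities (only the two costs, 20 and 8, come from the question).

```python
# Sketch (all numbers except the two stated costs are assumed placeholders):
# Bayes decision rule for the vaccination problem.  For each skin-test
# outcome we compare the posterior expected loss of "vaccinate" with that of
# "do not vaccinate" and choose the smaller.
prior_immune = 0.6                       # assumed prior proportion immune
# assumed reaction probabilities given immunity status
p_reaction = {
    "immune":     {"negligible": 0.6, "moderate": 0.3, "strong": 0.1},
    "vulnerable": {"negligible": 0.1, "moderate": 0.3, "strong": 0.6},
}
loss_not_vaccinate_vulnerable = 20.0     # stated in the question
loss_vaccinate_immune = 8.0              # stated in the question

for outcome in ["negligible", "moderate", "strong"]:
    joint_immune = prior_immune * p_reaction["immune"][outcome]
    joint_vuln = (1 - prior_immune) * p_reaction["vulnerable"][outcome]
    post_vuln = joint_vuln / (joint_immune + joint_vuln)
    exp_loss_vaccinate = (1 - post_vuln) * loss_vaccinate_immune
    exp_loss_skip = post_vuln * loss_not_vaccinate_vulnerable
    action = "vaccinate" if exp_loss_vaccinate < exp_loss_skip else "do not vaccinate"
    print(f"{outcome:10s}: P(vulnerable | test) = {post_vuln:.3f} -> {action}")
```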
(ii) A collection of students each sit exams. The ability of the th student is represented by and the performance of the th student on the th exam is measured by . Assume that, given , an appropriate model is that the variables are independent, and
for a known positive constant . It is reasonable to assume, a priori, that the are independent with
where and are population parameters, known from experience with previous cohorts of students.
Compute the posterior distribution of given the observed exam marks vector
Suppose now that is also unknown, but assumed to have a distribution, for known . Compute the posterior distribution of given and . Find, up to a normalisation constant, the form of the marginal density of given .
A2.11 B2.16
Part II, 2003 comment(i) Outline briefly the Bayesian approach to hypothesis testing based on Bayes factors.
(ii) Let be independent random variables, both uniformly distributed on . Find a minimal sufficient statistic for . Let , . Show that is ancillary and explain why the Conditionality Principle would lead to inference about being drawn from the conditional distribution of given . Find the form of this conditional distribution.
A3.12 B3.15
Part II, 2003 comment(i) Let be independent, identically distributed random variables, with the exponential density .
Obtain the maximum likelihood estimator of . What is the asymptotic distribution of ?
What is the minimum variance unbiased estimator of ? Justify your answer carefully.
(ii) Explain briefly what is meant by the profile log-likelihood for a scalar parameter of interest , in the presence of a nuisance parameter . Describe how you would test a null hypothesis of the form using the profile log-likelihood ratio statistic.
In a reliability study, lifetimes are independent and exponentially distributed, with means of the form where are unknown and are known constants. Inference is required for the mean lifetime, , for covariate value .
Find, as explicitly as possible, the profile log-likelihood for , with nuisance parameter .
Show that, under , the profile likelihood ratio statistic has a distribution which does not depend on the value of . How might the parametric bootstrap be used to obtain a test of of exact size ?
[Hint: if is exponentially distributed with mean 1, then is exponentially distributed with mean .]
A4.13 B4.15
Part II, 2003 commentWrite an account, with appropriate examples, of inference in multiparameter exponential families. Your account should include a discussion of natural statistics and their properties and of various conditional tests on natural parameters.
A1.12 B1.15
Part II, 2004 comment(i) What does it mean to say that a family of densities is an exponential family?
Consider the family of densities on parametrised by the positive parameters and defined by
Prove that this family is an exponential family, and identify the natural parameters and the reference measure.
(ii) Let be a sample drawn from the above distribution. Find the maximum-likelihood estimators of the parameters . Find the Fisher information matrix of the family (in terms of the natural parameters). Briefly explain the significance of the Fisher information matrix in relation to unbiased estimation. Compute the mean of and of .
A2.11 B2.16
Part II, 2004 comment(i) In the context of a decision-theoretic approach to statistics, what is a loss function? a decision rule? the risk function of a decision rule? the Bayes risk of a decision rule? the Bayes rule with respect to a given prior distribution?
Show how the Bayes rule with respect to a given prior distribution is computed.
(ii) A sample of people is to be tested for the presence of a certain condition. A single real-valued observation is made on each one; this observation comes from density if the condition is absent, and from density if the condition is present. Suppose if the person does not have the condition, otherwise, and suppose that the prior distribution for the is that they are independent with common distribution , where is known. If denotes the observation made on the person, what is the posterior distribution of the ?
Now suppose that the loss function is defined by
for action , where are positive constants. If denotes the posterior probability that given the data, prove that the Bayes rule for this prior and this loss function is to take if exceeds the threshold value , and otherwise to take .
In an attempt to control the proportion of false positives, it is proposed to use a different loss function, namely,
where . Prove that the Bayes rule is once again a threshold rule, that is, we take action if and only if , and determine as fully as you can.
A3.12 B3.15
Part II, 2004 comment(i) What is a sufficient statistic? What is a minimal sufficient statistic? Explain the terms nuisance parameter and ancillary statistic.
(ii) Let be independent random variables with common uniform distribution, and suppose you observe , where the positive parameters are unknown. Write down the joint density of and prove that the statistic
is minimal sufficient for . Find the maximum-likelihood estimator of .
Regarding as the parameter of interest and as the nuisance parameter, is ancillary? Find the mean and variance of . Hence find an unbiased estimator of .
A4.13 B4.15
Part II, 2004 commentSuppose that is the parameter of a non-degenerate exponential family. Derive the asymptotic distribution of the maximum-likelihood estimator of based on a sample of size . [You may assume that the density is infinitely differentiable with respect to the parameter, and that differentiation with respect to the parameter commutes with integration.]
1.II.27I
Part II, 2005 commentState Wilks' Theorem on the asymptotic distribution of likelihood-ratio test statistics.
Suppose that are independent with common distribution, where the parameters and are both unknown. Find the likelihood-ratio test statistic for testing against unrestricted, and state its (approximate) distribution.
What is the form of the -test of against ? Explain why for large the likelihood-ratio test and the -test are nearly the same.
2.II.27I
Part II, 2005 comment(i) Suppose that is a multivariate normal vector with mean and covariance matrix , where and are both unknown, and denotes the identity matrix. Suppose that are linear subspaces of of dimensions and , where . Let denote orthogonal projection onto . Carefully derive the joint distribution of under the hypothesis . How could you use this to make a test of against ?
(ii) Suppose that students take exams, and that the mark of student in exam is modelled as
where , the are independent , and the parameters and are unknown. Construct a test of for all against .
3.II.26I
Part II, 2005 commentIn the context of decision theory, explain the meaning of the following italicized terms: loss function, decision rule, the risk of a decision rule, a Bayes rule with respect to prior , and an admissible rule. Explain how a Bayes rule with respect to a prior can be constructed.
Suppose that are independent with common distribution, where is supposed to have a prior density . In a decision-theoretic approach to estimating , we take a quadratic loss: . Write and .
By considering decision rules (estimators) of the form , prove that if then the estimator is not Bayes, for any choice of prior .
By considering decision rules of the form , prove that if then the estimator is not Bayes, for any choice of prior .
[You may use without proof the fact that, if has a distribution, then .]
4.II.27I
Part II, 2005 commentA group of hospitals is to be 'appraised'; the 'performance' of hospital has a prior distribution, different hospitals being independent. The 'performance' cannot be measured directly, so an expensive firm of management consultants has been hired to arrive at each hospital's Standardised Index of Quality [SIQ], this being a number for hospital related to by the commercially-sensitive formula
where the are independent with common distribution.
(i) Assume that and are known. What is the posterior distribution of given ? Suppose that hospital was the hospital with the lowest SIQ, with a value ; conditional on , what is the distribution of ?
(ii) Now, instead of assuming and known, suppose that has a Gamma prior with parameters , density
for known and , and that , where is a known constant. Find the posterior distribution of given . Comment briefly on the form of the distribution.
1.II
Part II, 2006 comment(a) What is a loss function? What is a decision rule? What is the risk function of a decision rule? What is the Bayes risk of a decision rule with respect to a prior ?
(b) Let denote the risk function of decision rule , and let denote the Bayes risk of decision rule with respect to prior . Suppose that is a decision rule and is a prior over the parameter space with the two properties
(i)
(ii) .
Prove that is minimax.
(c) Suppose now that , where is the space of possible actions, and that the loss function is
where is a positive constant. If the law of the observation given parameter is , where is known, show (using (b) or otherwise) that the rule
is minimax.
2.II.27J
Part II, 2006 commentLet be a parametric family of densities for observation . What does it mean to say that the statistic is sufficient for ? What does it mean to say that is minimal sufficient?
State the Rao-Blackwell theorem. State the Cramér-Rao lower bound for the variance of an unbiased estimator of a (scalar) parameter, taking care to specify any assumptions needed.
Let be a sample from a distribution, where the positive parameter is unknown. Find a minimal sufficient statistic for . If is an unbiased estimator for , find the form of , and deduce that this estimator is minimum-variance unbiased. Would it be possible to reach this conclusion using the Cramér-Rao lower bound?
3.II.26J
Part II, 2006 commentWrite an essay on the rôle of the Metropolis-Hastings algorithm in computational Bayesian inference on a parametric model. You may for simplicity assume that the parameter space is finite. Your essay should:
(a) explain what problem in Bayesian inference the Metropolis-Hastings algorithm is used to tackle;
(b) fully justify that the algorithm does indeed deliver the required information about the model;
(c) discuss any implementational issues that need care.
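A possible concrete illustration for such an essay: a minimal Metropolis-Hastings sketch on an assumed finite grid of parameter values, targeting a posterior known only up to its normalising constant.

```python
# Sketch: Metropolis-Hastings on a finite parameter space {0, 1, ..., K-1}.
# The target is the posterior, known only up to proportionality as
# prior(theta) * likelihood(data | theta); the chain needs only these
# unnormalised values, which is exactly what makes the algorithm useful.
import numpy as np

rng = np.random.default_rng(1)
K = 20
theta_grid = np.linspace(0.05, 0.95, K)          # assumed finite grid of parameter values
prior = np.ones(K) / K                            # uniform prior over the grid
data = rng.binomial(1, 0.3, size=50)              # assumed Bernoulli data

def unnormalised_posterior(j):
    p = theta_grid[j]
    return prior[j] * p ** data.sum() * (1 - p) ** (len(data) - data.sum())

def propose(j):
    # Random-walk proposal on the grid, clipped at the ends; proposals between
    # distinct states remain symmetric, so the simple acceptance ratio applies.
    step = rng.choice([-1, 1])
    return min(max(j + step, 0), K - 1)

samples = []
j = K // 2                                        # arbitrary starting state
for _ in range(20000):
    j_new = propose(j)
    accept_prob = min(1.0, unnormalised_posterior(j_new) / unnormalised_posterior(j))
    if rng.uniform() < accept_prob:
        j = j_new
    samples.append(theta_grid[j])

print("posterior mean estimate:", np.mean(samples[2000:]))   # discard burn-in
```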
4.II.27J
Part II, 2006 comment(a) State the strong law of large numbers. State the central limit theorem.
(b) Assuming whatever regularity conditions you require, show that if is the maximum-likelihood estimator of the unknown parameter based on an independent identically distributed sample of size , then under
as , where is a matrix which you should identify. A rigorous derivation is not required.
(c) Suppose that are independent binomial random variables. It is required to test against the alternative . Show that the construction of a likelihood-ratio test leads us to the statistic
where . Stating clearly any result to which you appeal, for large , what approximately is the distribution of under ? Writing , and assuming that is small, show that
Using this and the central limit theorem, briefly justify the approximate distribution of given by asymptotic maximum-likelihood theory. What could you say if the assumption that is small failed?
1.II.27I
Part II, 2007 commentSuppose that has density where . What does it mean to say that statistic is sufficient for ?
Suppose that , where is the parameter of interest, and is a nuisance parameter, and that the sufficient statistic has the form . What does it mean to say that the statistic is ancillary? If it is, how (according to the conditionality principle) do we test hypotheses on ? Assuming that the set of possible values for is discrete, show that is ancillary if and only if the density (probability mass function) factorises as
for some functions , and with the properties
for all , and .
Suppose now that are independent observations from a distribution, with density
Assuming that the criterion (*) holds also for observations which are not discrete, show that it is not possible to find sufficient for such that is ancillary when is regarded as a nuisance parameter, and is the parameter of interest.
2.II.27I
Part II, 2007 comment(i) State Wilks' likelihood ratio test of the null hypothesis against the alternative , where . Explain when this test may be used.
(ii) Independent identically-distributed observations take values in the set , with common distribution which under the null hypothesis is of the form
for some , where is an open subset of some Euclidean space , . Under the alternative hypothesis, the probability mass function of the is unrestricted.
Assuming sufficient regularity conditions on to guarantee the existence and uniqueness of a maximum-likelihood estimator of for each , show that for large the Wilks' likelihood ratio test statistic is approximately of the form
where , and . What is the asymptotic distribution of this statistic?
3.II.26I
Part II, 2007 comment(i) In the context of decision theory, what is a Bayes rule with respect to a given loss function and prior? What is an extended Bayes rule?
Characterise the Bayes rule with respect to a given prior in terms of the posterior distribution for the parameter given the observation. When for some , and the loss function is , what is the Bayes rule?
(ii) Suppose that , with loss function and suppose further that under .
Supposing that a prior is taken over , compute the Bayes risk of the decision rule . Find the posterior distribution of given , and confirm that its mean is of the form for some value of which you should identify. Hence show that the decision rule is an extended Bayes rule.
4.II.27I
Part II, 2007 commentAssuming sufficient regularity conditions on the likelihood for a univariate parameter , establish the Cramér-Rao lower bound for the variance of an unbiased estimator of .
If is an unbiased estimator of whose variance attains the Cramér-Rao lower bound for every value of , show that the likelihood function is an exponential family.
1.II.27I
Part II, 2008 commentAn angler starts fishing at time 0. Fish bite in a Poisson Process of rate per hour, so that, if , the number of fish he catches in the first hours has the Poisson distribution , while , the time in hours until his th bite, has the Gamma distribution , with density function
Bystander plans to watch for 3 hours, and to record the number of fish caught. Bystander plans to observe until the 10th bite, and to record , the number of hours until this occurs.
For , show that is an unbiased estimator of whose variance function achieves the Cramér-Rao lower bound
Find an unbiased estimator of for , of the form . Does it achieve the Cramér-Rao lower bound? Is it minimum-variance-unbiased? Justify your answers.
In fact, the 10th fish bites after exactly 3 hours. For each of and , write down the likelihood function for based on their observations. What does the Likelihood Principle have to say about the inferences to be drawn by and , and why? Compute the estimates and produced by applying and to the observed data. Does the method of minimum-variance-unbiased estimation respect the Likelihood Principle?
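The numbers here are concrete (10 bites in 3 hours), so a small numerical sketch, using the standard Poisson and Gamma forms for this setup, can show that the two bystanders' likelihood functions are proportional in the rate while their minimum-variance unbiased estimates differ.

```python
# Sketch: the two likelihood functions in the fishing example.
# One bystander observes N = 10 fish in t = 3 hours (Poisson); the other
# observes T = 3 hours to the 10th bite (Gamma).  Both likelihoods are
# proportional to lambda^10 * exp(-3*lambda), so the Likelihood Principle says
# the inferences about lambda should agree, even though the standard unbiased
# estimators differ.
import numpy as np

t, n = 3.0, 10
lam = np.linspace(0.5, 8.0, 200)

lik_A = (lam * t) ** n * np.exp(-lam * t)               # Poisson(lambda*t) at N = 10 (up to n!)
lik_B = lam ** n * t ** (n - 1) * np.exp(-lam * t)      # Gamma(n, lambda) density at T = 3 (up to (n-1)!)

ratio = lik_A / lik_B
print("ratio constant in lambda:", np.allclose(ratio, ratio[0]))   # True

print("first bystander's unbiased estimate  N/t     =", n / t)        # 10/3
print("second bystander's unbiased estimate (n-1)/T =", (n - 1) / t)  # 3, a different value
```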
2.II.27I
Part II, 2008 commentUnder hypothesis , a real-valued observable , taking values in , has density function . Define the Type I error and the Type II error of a test of the null hypothesis against the alternative hypothesis . What are the size and power of the test in terms of and ?
Show that, for minimises among all possible tests if and only if it satisfies
What does this imply about the admissibility of such a test?
Given the value of a parameter variable , the observable has density function
For fixed , describe all the likelihood ratio tests of against .
For fixed , let be the test that rejects if and only if . Is admissible as a test of against for every ? Is it uniformly most powerful for its size for testing against the composite hypothesis ? Is it admissible as a test of against ?
3.II.26I
Part II, 2008 commentDefine the notion of exponential family , and show that, for data arising as a random sample of size from an exponential family, there exists a sufficient statistic whose dimension stays bounded as .
The log-density of a normal distribution can be expressed in the form
where is the value of an unknown parameter . Determine the function , and the natural parameter-space . What is the mean-value parameter in terms of ?
Determine the maximum likelihood estimator of based on a random sample , and give its asymptotic distribution for .
How would these answers be affected if the variance of were known to have value ?
4.II.27I
Part II, 2008 commentDefine sufficient statistic, and state the factorisation criterion for determining whether a statistic is sufficient. Show that a Bayesian posterior distribution depends on the data only through the value of a sufficient statistic.
Given the value of an unknown parameter , observables are independent and identically distributed with distribution . Show that the statistic is sufficient for .
If the prior distribution is , determine the posterior distribution of and the predictive distribution of .
In fact, there are two hypotheses as to the value of M. Under hypothesis , takes the known value 0; under , is unknown, with prior distribution . Explain why the Bayes factor for choosing between and depends only on , and determine its value for data .
The frequentist -level test of against rejects when . What is the Bayes factor for the critical case ? How does this behave as ? Comment on the similarities or differences in behaviour between the frequentist and Bayesian tests.
Paper 3, Section II, I
Part II, 2009 commentWhat is meant by an equaliser decision rule? What is meant by an extended Bayes rule? Show that a decision rule that is both an equaliser rule and extended Bayes is minimax.
Let be independent and identically distributed random variables with the normal distribution , and let . It is desired to estimate with loss function .
Suppose the prior distribution is . Find the Bayes act and the Bayes loss posterior to observing . What is the Bayes risk of the Bayes rule with respect to this prior distribution?
Show that the rule that estimates by is minimax.
Paper 4, Section II, I
Part II, 2009 commentConsider the double dichotomy, where the loss is 0 for a correct decision and 1 for an incorrect decision. Describe the form of a Bayes decision rule. Assuming the equivalence of normal and extensive form analyses, deduce the Neyman-Pearson lemma.
For a problem with random variable and real parameter , define monotone likelihood ratio (MLR) and monotone test.
Suppose the problem has MLR in a real statistic . Let be a monotone test, with power function , and let be any other test, with power function . Show that if and , then . Deduce that there exists such that for , and for .
For an arbitrary prior distribution with density , and an arbitrary value , show that the posterior odds
is a non-decreasing function of .
Paper 1, Section II, I
Part II, 2009 comment(i) Let be independent and identically distributed random variables, having the exponential distribution with density for . Show that is minimal sufficient and complete for .
[You may assume uniqueness of Laplace transforms.]
(ii) For given , it is desired to estimate the quantity . Compute the Fisher information for .
(iii) State the Lehmann-Scheffé theorem. Show that the estimator of defined by
is the minimum variance unbiased estimator of based on . Without doing any computations, state whether or not the variance of achieves the Cramér-Rao lower bound, justifying your answer briefly.
Let . Show that .
Paper 2, Section II, I
Part II, 2009 commentSuppose that the random vector has a distribution over depending on a real parameter , with everywhere positive density function . Define the maximum likelihood estimator , the score variable , the observed information and the expected (Fisher) information for the problem of estimating from .
For the case where the are independent and identically distributed, show that, as . [You may assume sufficient conditions to allow interchange of integration over the sample space and differentiation with respect to the parameter.] State the asymptotic distribution of .
The random vector is generated according to the rule
where and the are independent and identically distributed from the standard normal distribution . Write down the likelihood function for based on data , find and and show that the pair forms a minimal sufficient statistic.
A Bayesian uses the improper prior density . Show that, in the posterior, (where is a statistic that you should identify) has the same distribution as .
Paper 1, Section II, J
Part II, 2010 commentThe distribution of a random variable is obtained from the binomial distribution by conditioning on ; here is an unknown probability parameter and is known. Show that the distributions of form an exponential family and identify the natural sufficient statistic , natural parameter , and cumulant function . Using general properties of the cumulant function, compute the mean and variance of when . Write down an equation for the maximum likelihood estimate of and explain why, when , the distribution of is approximately normal for large .
Suppose we observe . It is suggested that, since the condition is then automatically satisfied, general principles of inference require that the inference to be drawn should be the same as if the distribution of had been and we had observed . Comment briefly on this suggestion.
Paper 2, Section II, J
Part II, 2010 commentDefine the Kolmogorov-Smirnov statistic for testing the null hypothesis that real random variables are independently and identically distributed with specified continuous, strictly increasing distribution function , and show that its null distribution does not depend on .
A composite hypothesis specifies that, when the unknown positive parameter takes value , the random variables arise independently from the uniform distribution . Letting , show that, under , the statistic is sufficient for . Show further that, given , the random variables are independent and have the distribution. How might you apply the Kolmogorov-Smirnov test to test the hypothesis ?
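A minimal sketch of the Kolmogorov-Smirnov statistic for a completely specified continuous distribution function, using the fact that the supremum is attained at the order statistics; since the transformed observations are Uniform(0,1) under the null, the statistic's null distribution does not depend on the hypothesised distribution, as the question asks you to show.

```python
# Sketch: Kolmogorov-Smirnov statistic for a completely specified continuous F0.
# D_n = sup_x |F_n(x) - F0(x)|, attained at the order statistics.
import numpy as np

def ks_statistic(x, F0):
    x = np.sort(x)
    n = len(x)
    u = F0(x)                          # under H0 these are Uniform(0,1): distribution-free
    i = np.arange(1, n + 1)
    return max(np.max(i / n - u), np.max(u - (i - 1) / n))

rng = np.random.default_rng(2)
sample = rng.uniform(size=100)
print("D_n for a Uniform(0,1) sample against F0(x) = x:",
      ks_statistic(sample, lambda x: x))
```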
Paper 3, Section II,
Part II, 2010 commentDefine the normal and extensive form solutions of a Bayesian statistical decision problem involving parameter , random variable , and loss function . How are they related? Let be the Bayes loss of the optimal act when and no data can be observed. Express the Bayes risk of the optimal statistical decision rule in terms of and the joint distribution of .
The real parameter has distribution , having probability density function . Consider the problem of specifying a set such that the loss when is , where is the indicator function of , where , and where . Show that the "highest density" region supplies a Bayes act for this decision problem, and explain why .
For the case , find an expression for in terms of the standard normal distribution function .
Suppose now that , that and that . Show that .
Paper 4, Section II,
Part II, 2010 commentDefine completeness and bounded completeness of a statistic in a statistical experiment.
Random variables are generated as , where are independently standard normal, and the parameter takes values in . What is the joint distribution of when ? Write down its density function, and show that a minimal sufficient statistic for based on is .
[Hint: You may use that if is the identity matrix and is the matrix all of whose entries are 1, then has determinant , and inverse with .]
What is ? Is complete for ?
Let . Show that is a positive constant which does not depend on , but that is not identically equal to . Is boundedly complete for ?
Paper 1, Section II, K
Part II, 2011 commentDefine admissible, Bayes, minimax decision rules.
A random vector has independent components, where has the normal distribution when the parameter vector takes the value . It is required to estimate by a point , with loss function . What is the risk function of the maximum-likelihood estimator ? Show that is dominated by the estimator .
Paper 2, Section II, K
Part II, 2011 commentRandom variables are independent and identically distributed from the normal distribution with unknown mean and unknown precision (inverse variance) . Show that the likelihood function, for data , is
where and .
A bivariate prior distribution for is specified, in terms of hyperparameters , as follows. The marginal distribution of is , with density
and the conditional distribution of , given , is normal with mean and precision .
Show that the conditional prior distribution of , given , is
Show that the posterior joint distribution of , given , has the same form as the prior, with updated hyperparameters which you should express in terms of the prior hyperparameters and the data.
[You may use the identity
where and .]
Explain how you could implement Gibbs sampling to generate a random sample from the posterior joint distribution.
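A minimal Gibbs-sampling sketch for this normal/Gamma conjugate structure, with assumed hyperparameter values; each full conditional is a standard distribution, which is what makes the sampler straightforward to implement.

```python
# Sketch (assumed hyperparameter values): Gibbs sampler for the normal model
# with unknown mean mu and precision tau, under the conjugate prior
#   tau ~ Gamma(a0, b0),   mu | tau ~ N(m0, precision c0 * tau).
# Each full conditional is available in closed form, so we alternate draws.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=2.0, scale=1.5, size=50)         # assumed data
n, xbar = len(x), x.mean()
a0, b0, m0, c0 = 2.0, 1.0, 0.0, 0.1                 # assumed prior hyperparameters

mu, tau = xbar, 1.0                                 # arbitrary initial values
draws = []
for _ in range(5000):
    # mu | tau, x  is normal with precision tau * (c0 + n)
    post_mean = (c0 * m0 + n * xbar) / (c0 + n)
    mu = rng.normal(post_mean, 1.0 / np.sqrt(tau * (c0 + n)))
    # tau | mu, x  is Gamma with shape a0 + (n + 1)/2 and rate
    # b0 + 0.5 * (sum (x_i - mu)^2 + c0 * (mu - m0)^2)
    shape = a0 + 0.5 * (n + 1)
    rate = b0 + 0.5 * (np.sum((x - mu) ** 2) + c0 * (mu - m0) ** 2)
    tau = rng.gamma(shape, 1.0 / rate)               # numpy's gamma takes a scale parameter
    draws.append((mu, tau))

mu_draws, tau_draws = np.array(draws[500:]).T        # discard burn-in
print("posterior means:", mu_draws.mean(), tau_draws.mean())
```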
Paper 3, Section II,
Part II, 2011 commentRandom variables are independent and identically distributed from the exponential distribution , with density function
when the parameter takes value . The following experiment is performed. First is observed. Thereafter, if have been observed, a coin having probability of landing heads is tossed, where is a known function and the coin toss is independent of the 's and previous tosses. If it lands heads, no further observations are made; if tails, is observed.
Let be the total number of 's observed, and . Write down the likelihood function for based on data , and identify a minimal sufficient statistic. What does the likelihood principle have to say about inference from this experiment?
Now consider the experiment that only records . Show that the density function of has the form
Assuming the function is twice differentiable and that both and vanish at 0 and , show that is an unbiased estimator of , and find its variance.
Stating clearly any general results you use, deduce that
Paper 4, Section II, K
Part II, 2011 commentWhat does it mean to say that a random vector has a multivariate normal distribution?
Suppose has the bivariate normal distribution with mean vector , and dispersion matrix
Show that, with is independent of , and thus that the conditional distribution of given is normal with mean and variance .
For are independent and identically distributed with the above distribution, where all elements of and are unknown. Let
where .
The sample correlation coefficient is . Show that the distribution of depends only on the population correlation coefficient .
Student's -statistic (on degrees of freedom) for testing the null hypothesis is
where and . Its density when is true is
where is a constant that need not be specified.
Express in terms of , and hence derive the density of when .
How could you use the sample correlation to test the hypothesis ?
Paper 4, Section II, K
Part II, 2012 commentFor , the pairs have independent bivariate normal distributions, with , and . The means are known; the parameters and are unknown.
Show that the joint distribution of all the variables belongs to an exponential family, and identify the natural sufficient statistic, natural parameter, and mean-value parameter. Hence or otherwise, find the maximum likelihood estimator of .
Let . What is the joint distribution of
Show that the distribution of
is . Hence describe a -level confidence interval for . Briefly explain what would change if and were also unknown.
[Recall that the distribution is that of , where, independently for and has the chi-squared distribution with degrees of freedom.]
Paper 3, Section II,
Part II, 2012 commentThe parameter vector is , with . Given , the integer random vector has a trinomial distribution, with probability mass function
Compute the score vector for the parameter , and, quoting any relevant general result, use this to determine .
Considering (1) as an exponential family with mean-value parameter , what is the corresponding natural parameter ?
Compute the information matrix for , which has -entry
where denotes the log-likelihood function, based on , expressed in terms of .
Show that the variance of is asymptotic to as . [Hint. The information matrix for is and the dispersion matrix of the maximum likelihood estimator behaves, asymptotically (for ) as .]
Paper 2, Section II,
Part II, 2012 commentCarefully defining all italicised terms, show that, if a sufficiently general method of inference respects both the Weak Sufficiency Principle and the Conditionality Principle, then it respects the Likelihood Principle.
The position of a particle at time has the Normal distribution , where is the value of an unknown parameter ; and the time, , at which the particle first reaches position has probability density function
Experimenter observes , and experimenter observes , where are fixed in advance. It turns out that . What does the Likelihood Principle say about the inferences about to be made by the two experimenters?
bases his inference about on the distribution and observed value of , while bases her inference on the distribution and observed value of . Show that these choices respect the Likelihood Principle.
Paper 1, Section II,
Part II, 2012 commentProve that, if is complete sufficient for , and is a function of , then is the minimum variance unbiased estimator of .
When the parameter takes a value , observables arise independently from the exponential distribution , having probability density function
Show that the family of distributions
with probability density function
is a conjugate family for Bayesian inference about (where is the Gamma function).
Show that the expectation of , under prior distribution (1), is , where . What is the prior variance of ? Deduce the posterior expectation and variance of , given .
Let denote the limiting form of the posterior expectation of as . Show that is the minimum variance unbiased estimator of . What is its variance?
Paper 4, Section II,
Part II, 2013 commentAssuming only the existence and properties of the univariate normal distribution, define , the multivariate normal distribution with mean (row-)vector and dispersion matrix ; and , the Wishart distribution on integer degrees of freedom and with scale parameter . Show that, if , and are fixed, then , where .
The random matrix has rows that are independently distributed as , where both parameters and are unknown. Let , where 1 is the vector of ; and , with . State the joint distribution of and given the parameters.
Now suppose and is positive definite. Hotelling's is defined as
where with . Show that, for any values of and ,
the distribution on and degrees of freedom.
[You may assume that:
- If and is a fixed vector, then
- If are independent, then
Paper 3, Section II, K
Part II, 2013 commentWhat is meant by a convex decision problem? State and prove a theorem to the effect that, in a convex decision problem, there is no point in randomising. [You may use standard terms without defining them.]
The sample space, parameter space and action space are each the two-point set . The observable takes value 1 with probability when the parameter , and with probability when . The loss function is 0 if , otherwise 1. Describe all the non-randomised decision rules, compute their risk functions, and plot these as points in the unit square. Identify an inadmissible non-randomised decision rule, and a decision rule that dominates it.
Show that the minimax rule has risk function , and is Bayes against a prior distribution that you should specify. What is its Bayes risk? Would a Bayesian with this prior distribution be bound to use the minimax rule?
Paper 1, Section II, K
Part II, 2013 commentWhen the real parameter takes value , variables arise independently from a distribution having density function with respect to an underlying measure . Define the score variable and the information function for estimation of based on , and relate to .
State and prove the Cramér-Rao inequality for the variance of an unbiased estimator of . Under what conditions does this inequality become an equality? What is the form of the estimator in this case? [You may assume , and any further required regularity conditions, without comment.]
Let be the maximum likelihood estimator of based on . What is the asymptotic distribution of when ?
Suppose that, for each is unbiased for , and the variance of is exactly equal to its asymptotic variance. By considering the estimator , or otherwise, show that, for .
Paper 2, Section II, K
Part II, 2013 commentDescribe the Weak Sufficiency Principle (WSP) and the Strong Sufficiency Principle (SSP). Show that Bayesian inference with a fixed prior distribution respects WSP.
A parameter has a prior distribution which is normal with mean 0 and precision (inverse variance) . Given , further parameters have independent normal distributions with mean and precision . Finally, given both and , observables are independent, being normal with mean , and precision . The precision parameters are all fixed and known. Let , where . Show, directly from the definition of sufficiency, that is sufficient for . [You may assume without proof that, if have independent normal distributions with the same variance, and , then the vector is independent of .]
For data-values , determine the joint distribution, say, of , given and . What is the distribution of , given and ?
Using these results, describe clearly how Gibbs sampling combined with Rao-Blackwellisation could be applied to estimate the posterior joint distribution of , given .
Paper 4, Section II, J
Part II, 2014 commentSuppose you have at hand a pseudo-random number generator that can simulate an i.i.d. sequence of uniformly distributed random variables for any . Construct an algorithm to simulate an i.i.d. sequence of standard normal random variables. [Should your algorithm depend on the inverse of any cumulative probability distribution function, you are required to provide an explicit expression for this inverse function.]
Suppose as a matter of urgency you need to approximately evaluate the integral
Find an approximation of this integral that requires simulation steps from your pseudo-random number generator, and which has stochastic accuracy
where Pr denotes the joint law of the simulated random variables. Justify your answer.
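One way to meet the requirement (illustration only): the Box-Muller transform avoids the normal inverse c.d.f. altogether, and a plain Monte Carlo average then has stochastic accuracy of order N^{-1/2} by the central limit theorem. The integrand g below is an assumed placeholder, since the integral in the question is elided.

```python
# Sketch: Box-Muller transform (avoids the normal inverse c.d.f., which has no
# closed form) followed by a plain Monte Carlo approximation of an integral of
# the form E[g(Z)] for standard normal Z.  The specific integral in the
# question is elided above, so g below is an assumed placeholder.
import numpy as np

rng = np.random.default_rng(4)

def standard_normals(n):
    """Generate n i.i.d. N(0,1) variables from Uniform(0,1) input only."""
    m = (n + 1) // 2
    u1, u2 = rng.uniform(size=m), rng.uniform(size=m)
    r = np.sqrt(-2.0 * np.log(u1))
    z = np.concatenate([r * np.cos(2 * np.pi * u2), r * np.sin(2 * np.pi * u2)])
    return z[:n]

g = lambda z: np.exp(-np.abs(z))        # assumed integrand, for illustration
N = 10**5
z = standard_normals(N)
estimate = g(z).mean()
# By the CLT the error is O_P(N^{-1/2}); a rough standard error:
print(estimate, g(z).std(ddof=1) / np.sqrt(N))
```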
Paper 3, Section II, J
Part II, 2014 commentState and prove Wilks' theorem about testing the simple hypothesis , against the alternative , in a one-dimensional regular parametric model . [You may use without proof the results from lectures on the consistency and asymptotic distribution of maximum likelihood estimators, as well as on uniform laws of large numbers. Necessary regularity conditions can be assumed without statement.]
Find the maximum likelihood estimator based on i.i.d. observations in a -model, . Deduce the limit distribution as of the sequence of statistics
where and are i.i.d. .
Paper 2, Section II, J
Part II, 2014 commentIn a general decision problem, define the concepts of a Bayes rule and of admissibility. Show that a unique Bayes rule is admissible.
Consider i.i.d. observations from a , model. Can the maximum likelihood estimator of be a Bayes rule for estimating in quadratic risk for any prior distribution on that has a continuous probability density on ? Justify your answer.
Now model the as i.i.d. copies of , where is drawn from a prior that is a Gamma distribution with parameters and (given below). Show that the posterior distribution of is a Gamma distribution and find its parameters. Find the Bayes rule for estimating in quadratic risk for this prior. [The Gamma probability density function with parameters is given by
where is the usual Gamma function.]
Finally assume that the have actually been generated from a fixed Poisson distribution, where . Show that converges to zero in probability and deduce the asymptotic distribution of under the joint law of the random variables . [You may use standard results from lectures without proof provided they are clearly stated.]
Paper 1, Section II, J
Part II, 2014 commentState without proof the inequality known as the Cramér-Rao lower bound in a parametric model . Give an example of a maximum likelihood estimator that attains this lower bound, and justify your answer.
Give an example of a parametric model where the maximum likelihood estimator based on observations is biased. State without proof an analogue of the Cramér-Rao inequality for biased estimators.
Define the concept of a minimax decision rule, and show that the maximum likelihood estimator based on in a model is minimax for estimating in quadratic risk.
Paper 4, Section II,
Part II, 2015 commentGiven independent and identically distributed observations with finite mean and variance , explain the notion of a bootstrap sample , and discuss how you can use it to construct a confidence interval for .
Suppose you can operate a random number generator that can simulate independent uniform random variables on . How can you use such a random number generator to simulate a bootstrap sample?
Suppose that and are cumulative probability distribution functions defined on the real line, that as for every , and that is continuous on . Show that, as ,
State (without proof) the theorem about the consistency of the bootstrap of the mean, and use it to give an asymptotic justification of the confidence interval . That is, prove that as where is the joint distribution of
[You may use standard facts of stochastic convergence and the Central Limit Theorem without proof.]
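A minimal sketch of the bootstrap confidence interval for the mean discussed above, using simulated placeholder data; the interval is calibrated from the bootstrap distribution of the centred resampled mean.

```python
# Sketch: nonparametric bootstrap confidence interval for a mean.
# Resample the data with replacement, recompute the statistic, and read off
# quantiles of the bootstrapped root (resampled mean minus sample mean) to
# calibrate the interval.
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(size=80)            # assumed data with unknown mean mu
n, xbar, B, alpha = len(x), x.mean(), 2000, 0.05

boot_means = np.array([
    rng.choice(x, size=n, replace=True).mean() for _ in range(B)
])
# Basic bootstrap interval: invert the quantiles of the centred resampled mean.
lo, hi = np.quantile(boot_means - xbar, [alpha / 2, 1 - alpha / 2])
print("approximate 95% CI for mu:", (xbar - hi, xbar - lo))
```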
Paper 3, Section II, J
Part II, 2015 commentDefine what it means for an estimator of an unknown parameter to be consistent.
Let be a sequence of random real-valued continuous functions defined on such that, as converges to in probability for every , where is non-random. Suppose that for some and every we have
and that has exactly one zero for every . Show that as , and deduce from this that the maximum likelihood estimator (MLE) based on observations from a model is consistent.
Now consider independent observations of bivariate normal random vectors
where and is the identity matrix. Find the MLE of and show that the MLE of equals
Show that is not consistent for estimating . Explain briefly why the MLE fails in this model.
[You may use the Law of Large Numbers without proof.]
Paper 2, Section II, J
Part II, 2015 commentConsider a random variable arising from the binomial distribution , . Find the maximum likelihood estimator and the Fisher information for .
Now consider the following priors on :
(i) a uniform prior on ,
(ii) a prior with density proportional to ,
(iii) a prior.
Find the means and modes of the posterior distributions corresponding to the prior distributions (i)-(iii). Which of these posterior decision rules coincide with ? Which one is minimax for quadratic risk? Justify your answers.
[You may use the following properties of the distribution. Its density , is proportional to , its mean is equal to , and its mode is equal to
provided either or .
You may further use the fact that a unique Bayes rule of constant risk is a unique minimax rule for that risk.]
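For illustration, a small sketch of the Beta-posterior summaries: under a Beta(a, b) prior the posterior is Beta(a + x, b + n - x), which covers priors (i) and (ii) as Beta(1, 1) and Beta(1/2, 1/2); the parameters of prior (iii) are elided above, so they are left as inputs.

```python
# Sketch: posterior mean and mode for X ~ Binomial(n, theta) under a
# Beta(a, b) prior.  The uniform prior is Beta(1, 1); the prior with density
# proportional to theta^(-1/2) (1 - theta)^(-1/2) is Beta(1/2, 1/2).
# The posterior is Beta(a + x, b + n - x).
def posterior_summary(x, n, a, b):
    a_post, b_post = a + x, b + n - x
    mean = a_post / (a_post + b_post)
    mode = None
    if a_post > 1 and b_post > 1:
        mode = (a_post - 1) / (a_post + b_post - 2)
    return mean, mode

x, n = 7, 20
for name, (a, b) in {"uniform": (1, 1), "Beta(1/2,1/2)": (0.5, 0.5)}.items():
    print(name, posterior_summary(x, n, a, b))
print("MLE x/n =", x / n)   # equals the posterior mode under the uniform prior
```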
Paper 1, Section II, J
Part II, 2015 commentConsider a normally distributed random vector modelled as where is the identity matrix, and where . Define the Stein estimator of .
Prove that dominates the estimator for the risk function induced by quadratic loss
Show however that the worst case risks coincide, that is, show that
[You may use Stein's lemma without proof, provided it is clearly stated.]
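A Monte Carlo sketch (with an assumed true mean vector) comparing the quadratic risk of the observation itself with that of the Stein-type estimator that shrinks it towards the origin; the simulation illustrates the domination claimed above, while the proof in the question uses Stein's lemma.

```python
# Sketch: Monte Carlo comparison of the quadratic risk of the maximum
# likelihood estimator X with that of the James-Stein-type estimator
#   (1 - (p - 2) / ||X||^2) X
# for X ~ N_p(theta, I), p >= 3.
import numpy as np

rng = np.random.default_rng(6)
p, reps = 10, 20000
theta = np.full(p, 0.5)                         # assumed true mean

X = theta + rng.normal(size=(reps, p))
norm_sq = np.sum(X ** 2, axis=1)
stein = (1 - (p - 2) / norm_sq)[:, None] * X

risk_mle = np.mean(np.sum((X - theta) ** 2, axis=1))
risk_stein = np.mean(np.sum((stein - theta) ** 2, axis=1))
print(f"risk of X: {risk_mle:.3f}   risk of Stein estimator: {risk_stein:.3f}")
# As the true mean moves far from the origin the two risks approach each
# other, consistent with the claim that the worst-case risks coincide.
```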
Paper 3, Section II, J
Part II, 2016 commentLet be i.i.d. random variables from a distribution, , and consider a Bayesian model for the unknown parameter, where is a fixed constant.
(a) Derive the posterior distribution of .
(b) Construct a credible set such that
(i) for every , and
(ii) for any ,
where denotes the distribution of the infinite sequence when drawn independently from a fixed distribution.
[You may use the central limit theorem.]
Paper 2, Section II,
Part II, 2016 comment(a) State and prove the Cramér-Rao inequality in a parametric model , where . [Necessary regularity conditions on the model need not be specified.]
(b) Let be i.i.d. Poisson random variables with unknown parameter . For and define
Show that for all values of .
Now suppose is an estimator of with possibly nonzero bias . Suppose the function is monotone increasing on . Prove that the mean-squared errors satisfy
Paper 4, Section II, J
Part II, 2016 commentConsider a decision problem with parameter space . Define the concepts of a Bayes decision rule and of a least favourable prior.
Suppose is a prior distribution on such that the Bayes risk of the Bayes rule equals , where is the risk function associated to the decision problem. Prove that is least favourable.
Now consider a random variable arising from the binomial distribution , where . Construct a least favourable prior for the squared risk . [You may use without proof the fact that the Bayes rule for quadratic risk is given by the posterior mean.]
Paper 1, Section II,
Part II, 2016 commentDerive the maximum likelihood estimator based on independent observations that are identically distributed as , where the unknown parameter lies in the parameter space . Find the limiting distribution of as .
Now define
and find the limiting distribution of as .
Calculate
for the choices and . Based on the above findings, which estimator of would you prefer? Explain your answer.
[Throughout, you may use standard facts of stochastic convergence, such as the central limit theorem, provided they are clearly stated.]
Paper 2, Section II,
Part II, 2017 commentWe consider the problem of estimating in the model , where
Here is the indicator of the set , and is known. This estimation is based on a sample of i.i.d. , and we denote by the ordered sample.
(a) Compute the mean and the variance of . Construct an unbiased estimator of taking the form , where , specifying .
(b) Show that is consistent and find the limit in distribution of . Justify your answer, citing theorems that you use.
(c) Find the maximum likelihood estimator of . Compute for all real . Is unbiased?
(d) For , show that has a limit in for some . Give explicitly the value of and the limit. Why should one favour using over ?
Paper 3, Section II,
Part II, 2017 commentWe consider the problem of estimating an unknown in a statistical model where , based on i.i.d. observations whose distribution has p.d.f. .
In all the parts below you may assume that the model satisfies necessary regularity conditions.
(a) Define the score function of . Prove that has mean 0 .
(b) Define the Fisher Information . Show that it can also be expressed as
(c) Define the maximum likelihood estimator of . Give without proof the limits of and of (in a manner which you should specify). [Be as precise as possible when describing a distribution.]
(d) Let be a continuously differentiable function, and another estimator of such that with probability 1 . Give the limits of and of (in a manner which you should specify).
Paper 4, Section II,
Part II, 2017 commentFor the statistical model , where is a known, positive-definite matrix, we want to estimate based on i.i.d. observations with distribution .
(a) Derive the maximum likelihood estimator of . What is the distribution of ?
(b) For , construct a confidence region such that .
(c) For , compute the maximum likelihood estimator of for the following parameter spaces:
(i) .
(ii) for some unit vector .
(d) For , we want to test the null hypothesis (i.e. ) against the composite alternative . Compute the likelihood ratio statistic and give its distribution under the null hypothesis. Compare this result with the statement of Wilks' theorem.
Paper 1, Section II,
Part II, 2017 commentFor a positive integer , we want to estimate the parameter in the binomial statistical model , based on an observation .
(a) Compute the maximum likelihood estimator for . Show that the posterior distribution for under a uniform prior on is , and specify and . [The p.d.f. of is given by
(b) (i) For a risk function , define the risk of an estimator of , and the Bayes risk under a prior for .
(ii) Under the loss function
find a Bayes optimal estimator for the uniform prior. Give its risk as a function of .
(iii) Give a minimax optimal estimator for the loss function given above. Justify your answer.
Paper 4, Section II,
Part II, 2018 commentLet be an unknown function, twice continuously differentiable with for all . For some , we know the value and we wish to estimate its derivative . To do so, we have access to a pseudo-random number generator that gives i.i.d. uniform over , and a machine that takes input and returns , where the are i.i.d. .
(a) Explain how this setup allows us to generate independent , where the take value 1 or with probability , for any .
(b) We denote by the output . Show that for some independent
(c) Using the intuition given by the least-squares estimator, justify the use of the estimator given by
(d) Show that
Show that for some choice of parameter , this implies
Paper 3, Section II, K
Part II, 2018 commentIn the model of a Gaussian distribution in dimension , with unknown mean and known identity covariance matrix , we estimate based on a sample of i.i.d. observations drawn from .
(a) Define the Fisher information , and compute it in this model.
(b) We recall that the observed Fisher information is given by
Find the limit of , where is the maximum likelihood estimator of in this model.
(c) Define the Wald statistic and compute it. Give the limiting distribution of and explain how it can be used to design a confidence interval for .
[You may use results from the course provided that you state them clearly.]
Paper 2, Section II,
Part II, 2018 commentWe consider the model of a Gaussian distribution in dimension , with unknown mean and known identity covariance matrix . We estimate based on one observation , under the loss function
(a) Define the risk of an estimator . Compute the maximum likelihood estimator of and its risk for any .
(b) Define what an admissible estimator is. Is admissible?
(c) For any , let be the prior . Find a Bayes optimal estimator under this prior with the quadratic loss, and compute its Bayes risk.
(d) Show that is minimax.
[You may use results from the course provided that you state them clearly.]
Paper 1, Section II,
Part II, 2018 commentA scientist wishes to estimate the proportion of presence of a gene in a population of flies of size . Every fly receives a chromosome from each of its two parents, each carrying the gene with probability or the gene with probability , independently. The scientist can observe if each fly has two copies of the gene A (denoted by AA), two copies of the gene (denoted by BB) or one of each (denoted by AB). We let , and denote the number of each observation among the flies.
(a) Give the probability of each observation as a function of , denoted by , for all three values , or .
(b) For a vector , we let denote the estimator defined by
Find the unique vector such that is unbiased. Show that is a consistent estimator of .
(c) Compute the maximum likelihood estimator of in this model, denoted by . Find the limiting distribution of . [You may use results from the course, provided that you state them clearly.]
Paper 4, Section II, J
Part II, 2019 commentWe consider a statistical model .
(a) Define the maximum likelihood estimator (MLE) and the Fisher information
(b) Let and assume there exist a continuous one-to-one function and a real-valued function such that
(i) For i.i.d. from the model for some , give the limit in the almost sure sense of
Give a consistent estimator of in terms of .
(ii) Assume further that and that is continuously differentiable and strictly monotone. What is the limit in distribution of ? Assume too that the statistical model satisfies the usual regularity assumptions. Do you necessarily expect for all ? Why?
(iii) Propose an alternative estimator for with smaller bias than if for some with .
(iv) Further to all the assumptions in (iii), assume that the MLE for is of the form
What is the link between the Fisher information at and the variance of ? What does this mean in terms of the precision of the estimator and why?
[You may use results from the course, provided you state them clearly.]
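A simulation sketch of the delta-method step that underlies parts (ii)-(iv); the model and the transformation g below are arbitrary assumptions (a Gaussian location model with g(t) = t^2), not those of the question. It checks that the rescaled plug-in error has standard deviation close to |g'(theta)| times the inverse root Fisher information.

```python
# Illustrative delta-method check: X_i ~ N(theta, 1), plug-in estimator g(Xbar).
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 2.0, 400, 20_000
g = lambda t: t ** 2
g_prime = lambda t: 2 * t

xbar = theta + rng.normal(size=(reps, n)).mean(axis=1)   # MLE of theta in each replicate
z = np.sqrt(n) * (g(xbar) - g(theta))                    # rescaled plug-in error

print(z.std(), abs(g_prime(theta)))   # Fisher info is 1 here, so the limit s.d. is |g'(theta)|
```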
Paper 3, Section II, J
Part II, 2019 commentWe consider the exponential model , where
We observe an i.i.d. sample from the model.
(a) Compute the maximum likelihood estimator for . What is the limit in distribution of ?
(b) Consider the Bayesian setting and place a Gamma prior for with density
where is the Gamma function satisfying for all . What is the posterior distribution for ? What is the Bayes estimator for the squared loss?
(c) Show that the Bayes estimator is consistent. What is the limiting distribution of ?
[You may use results from the course, provided you state them clearly.]
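A numerical sketch of the conjugate update, assuming (as the Gamma-function normalisation above suggests) that the exponential density is theta * exp(-theta * x) and the prior is Gamma(a, b) in the rate parametrisation with mean a/b; the hyperparameters and data below are arbitrary. The posterior is then Gamma(a + n, b + sum x_i) and the Bayes rule for squared loss is its mean.

```python
# Illustrative Gamma-exponential conjugacy: posterior mean vs the MLE 1/xbar.
import numpy as np

rng = np.random.default_rng(4)
theta_true, n = 2.5, 200
x = rng.exponential(scale=1.0 / theta_true, size=n)

a, b = 2.0, 1.0                               # prior hyperparameters (arbitrary)
a_post, b_post = a + n, b + x.sum()           # posterior Gamma(a + n, b + sum x_i)
bayes_estimator = a_post / b_post             # posterior mean, Bayes rule for squared loss
mle = 1.0 / x.mean()
print(bayes_estimator, mle)
```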
Paper 2, Section II, J
Part II, 2019 comment(a) We consider the model and an i.i.d. sample from it. Compute the expectation and variance of and check they are equal. Find the maximum likelihood estimator for and, using its form, derive the limit in distribution of .
(b) In practice, Poisson-looking data show overdispersion, i.e., the sample variance is larger than the sample expectation. For and , let ,
Show that this defines a distribution. Does it model overdispersion? Justify your answer.
(c) Let be an i.i.d. sample from . Assume is known. Find the maximum likelihood estimator for .
(d) Furthermore, assume that, for any , converges in distribution to a random variable as . Suppose we wanted to test the null hypothesis that our data arises from the model in part (a). Before making any further computations, can we necessarily expect to follow a normal distribution under the null hypothesis? Explain. Check your answer by computing the appropriate distribution.
[You may use results from the course, provided you state them clearly.]
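A quick illustration of the overdispersion phenomenon in part (b), not the question's algebra: Poisson data have sample variance close to the sample mean, whereas a Gamma-Poisson mixture (a negative-binomial-type distribution) is overdispersed. All parameter values are arbitrary assumptions.

```python
# Compare dispersion of Poisson draws with a Gamma-Poisson mixture of the same mean.
import numpy as np

rng = np.random.default_rng(5)
n, lam = 50_000, 4.0

pois = rng.poisson(lam, size=n)
rates = rng.gamma(shape=2.0, scale=lam / 2.0, size=n)   # random rate with mean lam
mixed = rng.poisson(rates)                              # Gamma-Poisson mixture

print(pois.mean(), pois.var())      # roughly equal: mean ~ variance
print(mixed.mean(), mixed.var())    # variance exceeds the mean: overdispersion
```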
Paper 1, Section II, J
Part II, 2019 commentIn a regression problem, for a given fixed, we observe such that
for an unknown and random such that for some known .
(a) When and has rank , compute the maximum likelihood estimator for . When , what issue is there with the likelihood maximisation approach and how many maximisers of the likelihood are there (if any)?
(b) For any fixed, we consider minimising
over . Derive an expression for and show it is well defined, i.e., there is a unique minimiser for every and .
Assume and that has rank . Let and note that for some orthogonal matrix and some diagonal matrix whose diagonal entries satisfy . Assume that the columns of have mean zero.
(c) Denote the columns of by . Show that they are sample principal components, i.e., that their pairwise sample correlations are zero and that they have sample variances , respectively. [Hint: the sample covariance between and is .]
(d) Show that
Conclude that prediction is the closest point to within the subspace spanned by the normalised sample principal components of part (c).
(e) Show that
Assume for some . Conclude that prediction is approximately the closest point to within the subspace spanned by the normalised sample principal components of part (c) with the greatest variance.
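A Python sketch of the SVD identity behind parts (c)-(e); the data and penalty below are simulated, illustrative assumptions. The ridge fitted values shrink the component of the response along each left-singular (principal-component) direction u_j of the centred design by the factor d_j^2 / (d_j^2 + lambda), so heavily penalised fits live essentially in the span of the leading principal components.

```python
# Ridge fit via the SVD of the centred design matrix.
import numpy as np

rng = np.random.default_rng(6)
n, p, lam = 100, 5, 10.0
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)                                  # columns of X have mean zero
beta = rng.normal(size=p)
Y = X @ beta + rng.normal(size=n)

beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

U, d, Vt = np.linalg.svd(X, full_matrices=False)     # X = U diag(d) V^T
shrink = d ** 2 / (d ** 2 + lam)
fit_svd = U @ (shrink * (U.T @ Y))                   # sum_j (d_j^2/(d_j^2+lam)) u_j <u_j, Y>

print(np.allclose(X @ beta_ridge, fit_svd))          # the two expressions agree
```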
Paper 1, Section II, J
Part II, 2020 commentState and prove the Cramér-Rao inequality for a real-valued parameter . [Necessary regularity conditions need not be stated.]
In a general decision problem, define what it means for a decision rule to be minimax.
Let be i.i.d. from a distribution, where . Prove carefully that is minimax for quadratic risk on .
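As a reminder of the standard statement being asked for, here is the usual one-parameter form of the bound in generic lecture notation (the symbols are not reconstructed from the stripped text above):

```latex
% Cramér–Rao bound, usual regularity conditions assumed.
% X_1,\dots,X_n i.i.d. with density f(\cdot,\theta); T = T(X_1,\dots,X_n) with
% differentiable mean \psi(\theta) = E_\theta T.
\[
  \operatorname{Var}_\theta(T) \;\ge\; \frac{\psi'(\theta)^2}{n\, I(\theta)},
  \qquad
  I(\theta) = E_\theta\!\left[\Big(\tfrac{\partial}{\partial\theta}\log f(X_1,\theta)\Big)^{2}\right],
\]
% so for an unbiased T (\psi(\theta) = \theta) the bound reads Var_\theta(T) \ge 1/(n I(\theta)).
```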
Paper 2, Section II, J
Part II, 2020 commentConsider from a distribution with parameter . Derive the likelihood ratio test statistic for the composite hypothesis
where is the parameter space constrained by .
Prove carefully that
where is a Chi-Square distribution with one degree of freedom.
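A Monte Carlo sketch of the Wilks phenomenon in an illustrative model chosen for convenience (a Poisson model with a simple point null), not necessarily the model or hypothesis in the question: under the null, the likelihood ratio statistic has approximately a chi-squared distribution with one degree of freedom.

```python
# Simulated null distribution of 2 log Lambda_n for H_0: theta = theta_0, Poisson(theta).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
theta0, n, reps = 3.0, 200, 20_000

x = rng.poisson(theta0, size=(reps, n))
xbar = x.mean(axis=1)                                   # MLE of theta in each replicate
lrt = 2 * n * (xbar * np.log(xbar / theta0) - (xbar - theta0))

q = stats.chi2.ppf(0.95, df=1)
print((lrt > q).mean())                                  # close to 0.05 under the null
```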
Paper 3, Section II, J
Part II, 2020 commentLet , let be a probability density function on and suppose we are given a further auxiliary conditional probability density function , on from which we can generate random draws. Consider a sequence of random variables generated as follows:
For and given , generate a new draw .
Define
where .
(i) Show that the Markov chain has invariant measure , that is, show that for all (measurable) subsets and all we have
(ii) Now suppose that is the posterior probability density function arising in a statistical model with observations and a prior distribution on . Derive a family such that in the above algorithm the acceptance probability is a function of the likelihood ratio , and for which the probability density function has covariance matrix for all .
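A compact Python sketch of the Metropolis-Hastings recursion described above, specialised to a symmetric Gaussian random-walk proposal so that the acceptance probability reduces to a ratio of target densities; the target density, step size and run length are arbitrary assumptions, and this is an illustration rather than the answer to part (ii).

```python
# Random-walk Metropolis sampler targeting an (unnormalised) density pi.
import numpy as np

rng = np.random.default_rng(8)

def log_pi(x):                       # illustrative target: standard Cauchy, unnormalised
    return -np.log1p(x ** 2)

def metropolis_hastings(n_iter=50_000, step=2.0, x0=0.0):
    x = x0
    chain = np.empty(n_iter)
    for t in range(n_iter):
        y = x + step * rng.normal()              # proposal draw from q(.|x) = N(x, step^2)
        log_alpha = log_pi(y) - log_pi(x)        # q is symmetric, so its terms cancel
        if np.log(rng.uniform()) < log_alpha:    # accept with probability min(1, alpha)
            x = y
        chain[t] = x
    return chain

chain = metropolis_hastings()
print(np.median(chain))                           # ~ 0, the median of the Cauchy target
```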
Paper 4, Section II, J
Part II, 2020 commentConsider drawn from a statistical model , with non-singular Fisher information matrix . For , define likelihood ratios
Next consider the probability density functions of normal distributions with corresponding likelihood ratios given by
Show that for every fixed , the random variables converge in distribution as to
[You may assume suitable regularity conditions of the model without specification, and results on uniform laws of large numbers from lectures can be used without proof.]
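As a worked illustration only (the one-dimensional Gaussian location case, not the general argument the question asks for), the local log-likelihood ratio can be computed exactly:

```latex
% Illustration in the N(\theta, 1) location model with \theta_n = \theta_0 + h/\sqrt{n}:
\[
  \log \prod_{i=1}^{n} \frac{f(X_i,\theta_0 + h/\sqrt n)}{f(X_i,\theta_0)}
  \;=\; h\,\sqrt{n}\,(\bar X_n - \theta_0) \;-\; \frac{h^2}{2}
  \;\xrightarrow{d}\; N\!\Big(-\frac{h^2}{2},\, h^2\Big)
  \quad\text{under }\theta_0,
\]
% which matches the general limit N(-h^2 I(\theta_0)/2,\ h^2 I(\theta_0)) since I(\theta_0) = 1 here.
```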
Paper 1, Section II, J
Part II, 2021 commentLet be random variables with joint probability density function in a statistical model .
(a) Define the Fisher information . What do we mean when we say that the Fisher information tensorises?
(b) Derive the relationship between the Fisher information and the derivative of the score function in a regular model.
(c) Consider the model defined by and
where are i.i.d. random variables, and is a known constant. Compute the Fisher information . For which values of does the Fisher information tensorise? State a lower bound on the variance of an unbiased estimator in this model.
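For part (b), the standard one-parameter derivation in generic notation reads as follows (the usual regularity condition is that differentiation under the integral sign is permitted; the symbols here are the generic lecture ones, not those stripped from the question):

```latex
% Differentiating \int f(x,\theta)\,dx = 1 twice in \theta:
\[
  0 = \int \partial_\theta f \,dx = E_\theta\big[\partial_\theta \log f(X,\theta)\big],
  \qquad
  0 = \int \partial^2_\theta f \,dx
    = E_\theta\big[\partial^2_\theta \log f(X,\theta)\big]
      + E_\theta\big[(\partial_\theta \log f(X,\theta))^2\big],
\]
% hence I(\theta) = E_\theta[(\partial_\theta \log f)^2] = -E_\theta[\partial^2_\theta \log f].
```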
Paper 2, Section II, J
Part II, 2021 commentLet be i.i.d. random observations taking values in with a continuous distribution function . Let for each .
(a) State the Kolmogorov-Smirnov theorem. Explain how this theorem may be used in a goodness-of-fit test for the null hypothesis , with continuous.
(b) Suppose you do not have access to the quantiles of the sampling distribution of the Kolmogorov-Smirnov test statistic. However, you are given i.i.d. samples with distribution function . Describe a test of with size exactly .
(c) Now suppose that are i.i.d. taking values in with probability density function , with . Define the density estimator
Show that for all and all ,
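A sketch of the Monte Carlo calibration idea in part (b), with an arbitrary choice of null distribution F_0, sample size and number of simulated samples (chosen so that (B + 1) * alpha is an integer): under a continuous F_0 the observed statistic and the B simulated ones are exchangeable, so rejecting when the observed statistic ranks in the top (B + 1) * alpha gives size exactly alpha.

```python
# Monte Carlo calibration of the Kolmogorov-Smirnov statistic under F_0 = N(0, 1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n, B, alpha = 50, 999, 0.05                 # (B + 1) * alpha = 50, an integer

x = rng.normal(size=n)                       # observed data (here actually drawn from F_0)
T_obs = stats.kstest(x, stats.norm.cdf).statistic

T_sim = np.array([stats.kstest(rng.normal(size=n), stats.norm.cdf).statistic
                  for _ in range(B)])        # simulated null statistics

rank_from_top = 1 + np.sum(T_sim >= T_obs)   # rank of T_obs among all B + 1 statistics
print("reject" if rank_from_top <= (B + 1) * alpha else "do not reject")
```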
Paper 3, Section II, J
Part II, 2021 commentLet be i.i.d. for some known and some unknown . [The gamma distribution has probability density function
and its mean and variance are and , respectively.]
(a) Find the maximum likelihood estimator for and derive the distributional limit of . [You may not use the asymptotic normality of the maximum likelihood estimator proved in the course.]
(b) Construct an asymptotic -level confidence interval for and show that it has the correct (asymptotic) coverage.
(c) Write down all the steps needed to construct a candidate for an asymptotic -level confidence interval for using the nonparametric bootstrap.
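A minimal sketch of the nonparametric bootstrap recipe in part (c); the statistic here is simply the sample mean and the data-generating gamma parameters are arbitrary assumptions, so this illustrates the resampling mechanics rather than the interval asked for in the question.

```python
# Percentile bootstrap interval: resample the data with replacement, recompute
# the statistic on each resample, and read off empirical quantiles.
import numpy as np

rng = np.random.default_rng(10)
n, B, alpha = 200, 2000, 0.05
x = rng.gamma(shape=3.0, scale=2.0, size=n)          # observed sample (illustrative)

theta_hat = x.mean()
boot = np.array([rng.choice(x, size=n, replace=True).mean() for _ in range(B)])

lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
print(theta_hat, (lo, hi))                            # percentile bootstrap interval
```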
Paper 4, Section II, J
Part II, 2021 commentSuppose that , and suppose the prior on is a gamma distribution with parameters and . [Recall that has probability density function
and that its mean and variance are and , respectively.]
(a) Find the -Bayes estimator for for the quadratic loss, and derive its quadratic risk function.
(b) Suppose we wish to estimate . Find the -Bayes estimator for for the quadratic loss, and derive its quadratic risk function. [Hint: The moment generating function of a Poisson distribution is for , and that of a Gamma distribution is for .]
(c) State a sufficient condition for an admissible estimator to be minimax, and give a proof of this fact.
(d) For each of the estimators in parts (a) and (b), is it possible to deduce using the condition in (c) that the estimator is minimax for some value of and ? Justify your answer.
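A numerical sketch of the conjugate update behind part (a), assuming the gamma prior is in the rate parametrisation with mean equal to the ratio of its two parameters; the hyperparameters and data are arbitrary. For i.i.d. Poisson observations the posterior is again gamma, and the Bayes rule under quadratic loss is the posterior mean, which shrinks the MLE towards the prior mean.

```python
# Gamma-Poisson conjugacy: posterior mean as the Bayes estimator under quadratic loss.
import numpy as np

rng = np.random.default_rng(11)
theta_true, n = 4.0, 100
x = rng.poisson(theta_true, size=n)

a, b = 2.0, 0.5                                  # prior hyperparameters (arbitrary)
a_post, b_post = a + x.sum(), b + n              # posterior Gamma(a + sum x_i, b + n)
bayes_estimator = a_post / b_post                # posterior mean
mle = x.mean()
print(bayes_estimator, mle)                      # the Bayes rule shrinks the MLE towards a/b
```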